9,245 research outputs found

    Adaptive text mining: Inferring structure from sequences

    Text mining is about inferring structure from sequences representing natural language text, and may be defined as the process of analyzing text to extract information that is useful for particular purposes. Although hand-crafted heuristics are a common practical approach for extracting information from text, a general, and generalizable, approach requires adaptive techniques. This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining. It develops several examples: extraction of hierarchical phrase structures from text, identification of keyphrases in documents, locating proper names and quantities of interest in a piece of text, text categorization, word segmentation, acronym extraction, and structure recognition. We conclude that compression forms a sound unifying principle that allows many text mining problems to be tackled adaptively.
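As a rough illustration of how compression can drive text categorization, the sketch below assigns a text to the category whose sample corpus lets it compress best. It is a minimal sketch, not the method of the paper: general-purpose `zlib` stands in for the adaptive compression models the abstract refers to, and the function names and the tiny sample corpora are hypothetical.

```python
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs for the text at maximum compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def categorize(text: str, corpora: dict) -> str:
    """Pick the category whose corpus makes `text` cheapest to encode.

    The cost of a text under a category is approximated by how many
    extra compressed bytes appear when the text is appended to that
    category's corpus (an assumption made for this sketch).
    """
    def extra_bytes(corpus: str) -> int:
        return compressed_size(corpus + " " + text) - compressed_size(corpus)

    return min(corpora, key=lambda name: extra_bytes(corpora[name]))


# Hypothetical sample corpora, far smaller than anything used in practice.
corpora = {
    "sport": (
        "the team won the match after scoring a late goal in the final "
        "minute of the game. the coach praised the players and the fans "
        "cheered in the stadium."
    ),
    "cooking": (
        "chop the onions and fry them gently in butter until golden. add "
        "the garlic, stir in the flour and simmer the sauce for ten "
        "minutes before serving."
    ),
}

label = categorize("the coach praised the team after scoring a late goal", corpora)
```

The text shares long substrings with one corpus, so the compressor can encode it mostly as back-references there, which is what makes the extra cost small for the matching category.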

    Digital libraries for the developing world

    Digital libraries (DLs) are the killer app for information technology in developing countries. Priorities here include health, agriculture, nutrition, hygiene, sanitation, and safe drinking water. Computers are not a priority, but simple, reliable access to targeted information meeting these basic needs certainly is. DLs can assist human development by providing a non-commercial mechanism for distributing humanitarian information on topics such as health, agriculture, nutrition, hygiene, sanitation, and water supply. Many other areas, ranging from disaster relief to medical education, from the preservation and propagation of indigenous culture to educational material that addresses specific community problems, also benefit from new methods of information distribution

    Customizing digital library interfaces with Greenstone

    Digital libraries are organized, focused collections of information. They are focused on a particular topic or theme—and good digital libraries will articulate the principles governing what is included. They are organized to make information accessible in particular, well-defined, ways—and good ones will include a description of how the information is organized (Lesk, 1997). The Greenstone digital library software is intended to help users construct simple collections of information very quickly. Indeed, only a few minutes of the user's time are needed to set up a collection based on a standard design and initiate the building process. Collections may be large—some comprise Gbytes of text; millions of documents. Furthermore, even larger volumes of information may be associated with a collection—typically audio, image, and video, with textual metadata. Once initiated, the mechanical process of building the collection may take from a few moments for a tiny collection to several hours for a multi-Gbyte one—perhaps even a day if it involves many different full-text indexes

    Creating and customizing digital library collections with the Greenstone Librarian Interface

    The Greenstone digital library software is a comprehensive system for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet. This paper describes how digital library collections can be created and customized with the new Greenstone Librarian Interface. Its basic features allow users to add documents and metadata to collections, create new collections whose structure mirrors existing ones, and build collections and put them in place for users to view. More advanced users can design and customize new collection structures. At the most advanced level, the Librarian Interface gives expert users interactive access to the full power of Greenstone, which could formerly be tapped only by running Perl scripts manually.

    Classification

    In classification learning, an algorithm is presented with a set of classified examples or "instances" from which it is expected to infer a way of classifying unseen instances into one of several "classes". Instances have a set of features or "attributes" whose values define that particular instance. Numeric prediction, or "regression", is a variant of classification learning in which the class attribute is numeric rather than categorical. Classification learning is sometimes called supervised because the method operates under supervision by being provided with the actual outcome for each of the training instances. This contrasts with data clustering (see entry Data Clustering), where the classes are not given, and with association learning (see entry Association Learning), which seeks any association, not just one that predicts the class.
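The setting described above can be made concrete with a small sketch. It uses a k-nearest-neighbour rule, chosen here only because it is short; the entry itself does not prescribe any particular algorithm, and the function name, attribute values, and class labels are hypothetical.

```python
import math
from collections import Counter


def classify(training, instance, k=1):
    """Classify an unseen instance from labelled training instances.

    training: list of (attributes, class_label) pairs, where attributes
              is a tuple of numeric values.
    instance: tuple of numeric attribute values with no class label.
    """
    # Keep the k training instances closest to the unseen instance.
    neighbours = sorted(
        training, key=lambda example: math.dist(example[0], instance)
    )[:k]
    # Predict the class that occurs most often among those neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Classified examples: the actual outcome (class) is given for each one,
# which is what makes the learning supervised.
training = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

prediction = classify(training, (1.2, 1.4))
```

If the class attribute were numeric, the same neighbours could be averaged instead of counted, which would turn this into numeric prediction (regression).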

    The Development and Usage of the Greenstone Digital Library Software

    The Greenstone software has helped spread the practical impact of digital library technology throughout the world, particularly in developing countries. This article reviews the project’s origins, usage, and the development of support mechanisms for Greenstone users. We begin with a brief summary of salient aspects of this open source software package and its user population. Next we describe how its international, humanitarian focus arose. We then review the special requirements imposed by the conditions that prevail in developing countries. Finally, we discuss efforts to establish regional support organizations for Greenstone in India and Africa.

    Categories of holomorphic line bundles on higher dimensional noncommutative complex tori

    We construct explicitly noncommutative deformations of categories of holomorphic line bundles over higher dimensional tori. Our basic tools are Heisenberg modules over noncommutative tori and complex/holomorphic structures on them introduced by A. Schwarz. We obtain differential graded (DG) categories as full subcategories of curved DG categories of Heisenberg modules over the complex noncommutative tori. Also, we present the explicit composition formula of morphisms, which in fact depends on the noncommutativity. Comment: 28 pages

    Thesaurus-based index term extraction for agricultural documents

    This paper describes a new algorithm for automatically extracting index terms from documents relating to the domain of agriculture. The domain-specific Agrovoc thesaurus developed by the FAO is used both as a controlled vocabulary and as a knowledge base for semantic matching. The automatically assigned terms are evaluated against a manually indexed 200-item sample of the FAO’s document repository, and the performance of the new algorithm is compared with a state-of-the-art system for keyphrase extraction

    Strong-Electroweak Unification at About 4 TeV

    It is shown how an $SU(3)^{N}$ nonsupersymmetric quiver gauge theory can accommodate the standard model with three chiral families and unify all of the $SU(3)_C$, $SU(2)_L$ and $U(1)_Y$ couplings with high accuracy at one unique scale estimated as $M \simeq 4$ TeV. Comment: 3 pages LaTeX. Typos corrected. Text and references added

    Topological Charge Membranes in 2D and 4D Gauge Theory

    Local topological charge structure in the 2D CP(N-1) sigma models is studied using the overlap Dirac operator. Long-range coherence of topological charge along locally 1D regions in 2D space-time is observed. We discuss the connection between these results and the recent discovery of coherent 3D sheets of topological charge in 4D QCD. In both cases, coherent regions of topological charge form along surfaces of approximate codimension 1. Comment: Lattice2004(topology)